Gene Onoltogy Analysis on day25 Ancestral

1. Environment Set Up

for (i in 1:length(params))
  print(paste('Parameter:', names(params)[i], ' - Value:', params[[i]], '- Class:', class(params[[i]])))
## [1] "Parameter: Dataset  - Value: CHD2_iPSCs_and_organoids_PublicRepo - Class: character"
## [1] "Parameter: SEFile  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day25/Output/Savings/day25CbO.SE_deseq2_AR.rds - Class: character"
## [1] "Parameter: DEAList  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day25/Output/Savings/day25CbO.DEAList_AR.rds - Class: character"
## [1] "Parameter: AR  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day25/Output/Savings/day25CbO.deseqARvsWT.rds - Class: character"
## [1] "Parameter: SavingFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day25/Output/Savings/ - Class: character"
## [1] "Parameter: FiguresFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day25/Output/Figures/ - Class: character"
## [1] "Parameter: FDRthr  - Value: 0.05 - Class: numeric"
## [1] "Parameter: logFCthr  - Value: 0.55 - Class: numeric"
## [1] "Parameter: TopGO  - Value: BP_MF_CC - Class: character"
## [1] "Parameter: GoEnTh  - Value: 1 - Class: numeric"
## [1] "Parameter: GoPvalTh  - Value: 0.05 - Class: numeric"
## [1] "Parameter: NbName  - Value: TopGO_day25_AR - Class: character"
## [1] "Parameter: SaveImages  - Value: FALSE - Class: logical"
  • Dataset: name of the dataset that is processed.
  • SE: input file containing differential expression results from DESeq2 (absolute path).
  • FDRthr: threshold on False Discovery Rate. Default 0.05.
  • logFCthr: threshold on False Discovery Rate. Default 1.5.
  • SavingFolder: directory where produced files will be written (absolute path). Default is getwd().
  • TopGO: string that specify the ontology domains to be analysed. Default BP and MF; also CC can be added.
library(RNASeqBulkExploratory)
library(SummarizedExperiment)
library(tidyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(topGO)
library(sechm)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(cowplot)
library(data.table)
## data.table 1.14.2 using 2 threads (see ?getDTthreads).  Latest news: r-datatable.com
## 
## Attaching package: 'data.table'
## The following object is masked from 'package:SummarizedExperiment':
## 
##     shift
## The following object is masked from 'package:GenomicRanges':
## 
##     shift
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:IRanges':
## 
##     shift
## The following objects are masked from 'package:S4Vectors':
## 
##     first, second
source('/group/testa/Users/oliviero.leonardi/myProjects/CHD2/BulkRNAseq/ContainerHome/CHD2_organoids/NoGradientBarplots.R')
Dataset <- params$Dataset
logFCthr <- params$logFCthr
FDRthr <- params$FDRthr

FdrTh <- FDRthr
logFcTh <- logFCthr

SavingFolder <- ifelse(is.null(params$SavingFolder), getwd(), params$SavingFolder)
FiguresFolder <- ifelse(is.null(params$FiguresFolder), getwd(), params$FiguresFolder)


if (dir.exists(SavingFolder) == FALSE) {
  dir.create(SavingFolder, recursive=TRUE)
}

2. Data Upload

  • Summarized Experiment object containing expression data used for DEA and gene and sample metadata
  • DEA object, containing results of the differential expression

2.1 Load Data from DEA

#SE object coming from DEA, but not containing specific contrast results
SE_DEA <- readRDS(params$SEFile)

SE_DEA <- SE_DEA[rowData(SE_DEA)$GeneName != '', ]
rownames(SE_DEA) <- rowData(SE_DEA)$GeneName

# List with differential expression results (all time-points)
DEA <- readRDS(params$DEAList)
colvector <- c("#5ec962", "#e95462", "#2c728e")
names(colvector) <- c('All', 'Up',  'Down')

2.2 Add DEA results to SE

if(! identical(rownames(SE_DEA), row.names(DEA$AR$res))){
  stop('Expression data in SE and results from differential espression analysis are inconsistent.')
}

SE_DEA <- mergeDeaSE(SE_DEA, DEA$AR$res, subsetRowDataCols=NULL,
                     logFcCol='log2FoldChange', FdrCol='padj') #specify
## Renaming " log2FoldChange " to "logFC"
## Renaming "  padj  " to "FDR"

17781 genes in 13 samples have been testes for differential expression.

The following number of genes are identified as differentially expressed:

  • FDR < 0.1: 268 differentially expressed genes
  • FDR < 0.05: 171 differentially expressed genes
  • FDR < 0.05 and FC > 1.5: 111 differentially expressed genes
  • FDR < 0.05 and FC > 2: 74 differentially expressed genes

Imposing a threshold of 0.55 on the Log2FC and 0.05 on the FDR (as specified in parameters), 171 genes are selected: 127 up-regulated genes and 126 down-regulated genes.


3. RESULTS NAVIGATION: Interactive Table

An interactive table show the results for all DEGs (ranked according to FDR). A table of all DEGs can be downloaded from here.

From topTag I generate an interactive table for result interrogation with link to the Gene Cards. The table reports all the genes having a FDR < 0.05 and a Log2FC > 0.55 as absolute value, according to the threshold settings.

DEGsTable(SE_DEA, FdrTh=FdrTh, logFcTh=logFCthr, maxGenes=Inf, saveDEGs=FALSE)

4. RESULTS VISUALIZATION

4.1 Volcano plot

The results of the differential expression analysis are visualized by Volcano plot. An interactive version is included in the html (only genes with FDR < threshold), while a static version is saved.

plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = FALSE)
## Warning: Removed 1379 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 17664 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).


plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = TRUE)

4.2 Heatmap for significant genes

Heatmaps for DEGs, showing scaled vst values.

DEGs <- dplyr::filter(data.frame(rowData(SE_DEA)), FDR < FDRthr & abs(logFC) > log2(logFCthr))
ScaledCols <- c('darkblue', "purple","white","lightgoldenrod1", 'goldenrod1')
# sechm::sechm(SE_DEA, genes=DEGs$GeneName, assayName="vst", gaps_at="Genotype", show_rownames=FALSE,
#       top_annotation=c('Genotype'), hmcols=ScaledCols, show_colnames=TRUE,
#       do.scale=TRUE, breaks=0.85, column_title = "Scaled Vst Values")

5. TOPGO for Gene Ontology Enrichment analysis

Gene ontology enrichment analysis is performed on the set of 171 genes using TopGO with Fisher statistics and weight01 algorithm.

For each specified domain of the ontology:

  • Enrichment analysis on all DEGs or splitted in down- and up-regulated

5.1 Selection of modulated genes and generation of gene vectors

I generate vectors for the gene universe, all modulated genes, up-regulated genes and down-regulated genes in the format required by TopGo.

GeneVectors <- topGOGeneVectors(SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr)
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1

Therefore:

  • universe genes: 17781 genes
  • modulated genes: 117 genes
  • down-regulated genes: 57 genes of interest
  • up-regulated genes: 60 genes of interest

Then I set parameters according to the gene ontology domains to be evaluated. By default, Biological Process and Molecular Function domains are interrogated.

BpEval <- ifelse(length(grep('BP', params$TopGO))!=0, TRUE, FALSE)
MfEval <- ifelse(length(grep('MF', params$TopGO))!=0, TRUE, FALSE)
CcEval <- ifelse(length(grep('CC', params$TopGO))!=0, TRUE, FALSE)

5.2 TopGO analysis: Biological Process

On the basis of the analysis settings, the enrichment for Biological Process IS performed.

Biological Process Analysis for ALL modulated genes: 117 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
BPannAR <- topGO::annFUN.org(whichOnto="BP", feasibleGenes=names(GeneVectors$DEGenes), 
                    mapping="org.Hs.eg.db", ID="symbol") %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResBPAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=BPannAR, ontology='BP', 
                         desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='BPAllAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11815 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15469 GO terms and 35188 relations. )
## 
## Annotating nodes ...............
##  ( 14087 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 2718 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 17:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 16:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 15:  13 nodes to be scored   (34 eliminated genes)
## 
##   Level 14:  24 nodes to be scored   (71 eliminated genes)
## 
##   Level 13:  54 nodes to be scored   (359 eliminated genes)
## 
##   Level 12:  89 nodes to be scored   (892 eliminated genes)
## 
##   Level 11:  157 nodes to be scored  (2916 eliminated genes)
## 
##   Level 10:  252 nodes to be scored  (4602 eliminated genes)
## 
##   Level 9:   339 nodes to be scored  (5923 eliminated genes)
## 
##   Level 8:   396 nodes to be scored  (7496 eliminated genes)
## 
##   Level 7:   439 nodes to be scored  (9306 eliminated genes)
## 
##   Level 6:   407 nodes to be scored  (11640 eliminated genes)
## 
##   Level 5:   291 nodes to be scored  (12754 eliminated genes)
## 
##   Level 4:   155 nodes to be scored  (13487 eliminated genes)
## 
##   Level 3:   75 nodes to be scored   (13744 eliminated genes)
## 
##   Level 2:   21 nodes to be scored   (13865 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13909 eliminated genes)

Biological Process Analysis for DOWN-REGULATED genes: 57 genes

# Wrapper function for topGO analysis (see helper file)
ResBPDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=BPannAR, ontology='BP', 
                          desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='BPDownAR', outDir=paste0(SavingFolder)) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11815 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15469 GO terms and 35188 relations. )
## 
## Annotating nodes ...............
##  ( 14087 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1566 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 14:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 13:  15 nodes to be scored   (40 eliminated genes)
## 
##   Level 12:  38 nodes to be scored   (309 eliminated genes)
## 
##   Level 11:  73 nodes to be scored   (2104 eliminated genes)
## 
##   Level 10:  123 nodes to be scored  (3888 eliminated genes)
## 
##   Level 9:   177 nodes to be scored  (5198 eliminated genes)
## 
##   Level 8:   223 nodes to be scored  (6658 eliminated genes)
## 
##   Level 7:   255 nodes to be scored  (8343 eliminated genes)
## 
##   Level 6:   263 nodes to be scored  (10823 eliminated genes)
## 
##   Level 5:   209 nodes to be scored  (12420 eliminated genes)
## 
##   Level 4:   113 nodes to be scored  (13397 eliminated genes)
## 
##   Level 3:   50 nodes to be scored   (13698 eliminated genes)
## 
##   Level 2:   18 nodes to be scored   (13853 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13898 eliminated genes)

# Selection on enrichment of at least 2 is implemented (also to avoid depleted categories). Then categories are ranked by PVal and all the ones with Pval < th are selected. If the number is < minTerms, othter terms are included to reach the minimum number. 
GOTable(ResBPDownAR$ResSel, maxGO=20)

Biological Process Analysis for UP-REGULATED genes: 60 genes

ResBPUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=BPannAR, ontology='BP', 
                        desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='BPUpAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11815 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15469 GO terms and 35188 relations. )
## 
## Annotating nodes ...............
##  ( 14087 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 2100 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 17:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 16:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 15:  12 nodes to be scored   (34 eliminated genes)
## 
##   Level 14:  19 nodes to be scored   (71 eliminated genes)
## 
##   Level 13:  43 nodes to be scored   (352 eliminated genes)
## 
##   Level 12:  63 nodes to be scored   (822 eliminated genes)
## 
##   Level 11:  107 nodes to be scored  (2748 eliminated genes)
## 
##   Level 10:  174 nodes to be scored  (4244 eliminated genes)
## 
##   Level 9:   236 nodes to be scored  (5359 eliminated genes)
## 
##   Level 8:   286 nodes to be scored  (6809 eliminated genes)
## 
##   Level 7:   337 nodes to be scored  (8217 eliminated genes)
## 
##   Level 6:   331 nodes to be scored  (10482 eliminated genes)
## 
##   Level 5:   252 nodes to be scored  (12291 eliminated genes)
## 
##   Level 4:   143 nodes to be scored  (13356 eliminated genes)
## 
##   Level 3:   71 nodes to be scored   (13712 eliminated genes)
## 
##   Level 2:   20 nodes to be scored   (13846 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13907 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/BPUp'), recursive=TRUE)
#GOAnnotation(ResBPUp$ResSel, GOdata=ResBPUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/BPUp'), keytype='SYMBOL')
GOTable(ResBPUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResBPAllAR$ResSel, TopGOResDown=ResBPDownAR$ResSel, TopGOResUp = ResBPUpAR$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResBPAllAR$ResSel, TopGOResDown = ResBPDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.3 TopGO analysis: Molecular Function

On the basis of the analysis settings, the enrichment for Molecular Function IS performed.

Molecular Function Enrichment for ALL modulated genes: 117 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
MFannAR <- topGO::annFUN.org(whichOnto='MF', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResMFAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=MFannAR, ontology='MF', 
                         desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='MFAllAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4195 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4656 GO terms and 6011 relations. )
## 
## Annotating nodes ...............
##  ( 14512 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 365 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  8 nodes to be scored    (9 eliminated genes)
## 
##   Level 9:   18 nodes to be scored   (71 eliminated genes)
## 
##   Level 8:   24 nodes to be scored   (1199 eliminated genes)
## 
##   Level 7:   41 nodes to be scored   (3226 eliminated genes)
## 
##   Level 6:   59 nodes to be scored   (3635 eliminated genes)
## 
##   Level 5:   84 nodes to be scored   (4701 eliminated genes)
## 
##   Level 4:   77 nodes to be scored   (6997 eliminated genes)
## 
##   Level 3:   34 nodes to be scored   (10418 eliminated genes)
## 
##   Level 2:   13 nodes to be scored   (11604 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14334 eliminated genes)

Molecular Function Enrichment for DOWN-REGULATED genes: 57 genes

ResMFDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=MFannAR, ontology='MF', 
                          desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='MFDownAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4195 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4656 GO terms and 6011 relations. )
## 
## Annotating nodes ...............
##  ( 14512 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 220 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  3 nodes to be scored    (4 eliminated genes)
## 
##   Level 9:   12 nodes to be scored   (50 eliminated genes)
## 
##   Level 8:   14 nodes to be scored   (1118 eliminated genes)
## 
##   Level 7:   25 nodes to be scored   (3079 eliminated genes)
## 
##   Level 6:   37 nodes to be scored   (3310 eliminated genes)
## 
##   Level 5:   49 nodes to be scored   (3859 eliminated genes)
## 
##   Level 4:   47 nodes to be scored   (6176 eliminated genes)
## 
##   Level 3:   22 nodes to be scored   (9612 eliminated genes)
## 
##   Level 2:   7 nodes to be scored    (11228 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14285 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFDown'), recursive=TRUE)
#GOAnnotation(ResMFDown$ResSel, GOdata=ResMFDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFDown'), keytype='SYMBOL')
GOTable(ResMFDownAR$ResSel, maxGO=20)

Molecular Function Analysis for UP-REGULATED genes: 60 genes

ResMFUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=MFannAR, ontology='MF', 
                        desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='MFUpAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4195 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4656 GO terms and 6011 relations. )
## 
## Annotating nodes ...............
##  ( 14512 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 242 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  6 nodes to be scored    (5 eliminated genes)
## 
##   Level 9:   9 nodes to be scored    (21 eliminated genes)
## 
##   Level 8:   13 nodes to be scored   (1138 eliminated genes)
## 
##   Level 7:   21 nodes to be scored   (1507 eliminated genes)
## 
##   Level 6:   34 nodes to be scored   (1877 eliminated genes)
## 
##   Level 5:   55 nodes to be scored   (2845 eliminated genes)
## 
##   Level 4:   56 nodes to be scored   (5226 eliminated genes)
## 
##   Level 3:   31 nodes to be scored   (9094 eliminated genes)
## 
##   Level 2:   13 nodes to be scored   (11377 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14325 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFUp'), recursive=TRUE)
#GOAnnotation(ResMFUp$ResSel, GOdata=ResMFUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFUp'), keytype='SYMBOL')
GOTable(ResMFUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResMFAllAR$ResSel, TopGOResDown=ResMFDownAR$ResSel, TopGOResUp=ResMFUpAR$ResSel, 
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResMFAllAR$ResSel, TopGOResDown = ResMFDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.4 TopGO analysis: Cellular Component

On the basis of the analysis settings, the enrichment for Cellular Component IS performed.

Cellular Component Enrichment for ALL modulated genes: 117 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
CCannAR <- topGO::annFUN.org(whichOnto='CC', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResCCAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=CCannAR, ontology='CC', 
                         desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='CCAllAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1707 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1926 GO terms and 3236 relations. )
## 
## Annotating nodes ...............
##  ( 14776 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 300 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  9 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  31 nodes to be scored   (0 eliminated genes)
## 
##   Level 9:   41 nodes to be scored   (374 eliminated genes)
## 
##   Level 8:   40 nodes to be scored   (1751 eliminated genes)
## 
##   Level 7:   42 nodes to be scored   (4207 eliminated genes)
## 
##   Level 6:   46 nodes to be scored   (8319 eliminated genes)
## 
##   Level 5:   38 nodes to be scored   (10227 eliminated genes)
## 
##   Level 4:   24 nodes to be scored   (12571 eliminated genes)
## 
##   Level 3:   26 nodes to be scored   (14072 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14531 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14694 eliminated genes)

#write.table(ResCCAll$ResAll, file=paste0(SavingFolder, 'TopGO/CCAllResults.txt'), sep='\t', row.names=FALSE)

Cellular Component Enrichment for DOWN-REGULATED genes: 57 genes

# Wrapper function for topGO analysis (see helper file)
ResCCDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=CCannAR, ontology='CC', 
                          desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='CCDownAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1707 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1926 GO terms and 3236 relations. )
## 
## Annotating nodes ...............
##  ( 14776 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 210 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  17 nodes to be scored   (0 eliminated genes)
## 
##   Level 9:   24 nodes to be scored   (243 eliminated genes)
## 
##   Level 8:   29 nodes to be scored   (1092 eliminated genes)
## 
##   Level 7:   32 nodes to be scored   (2771 eliminated genes)
## 
##   Level 6:   31 nodes to be scored   (7979 eliminated genes)
## 
##   Level 5:   29 nodes to be scored   (10024 eliminated genes)
## 
##   Level 4:   20 nodes to be scored   (12516 eliminated genes)
## 
##   Level 3:   21 nodes to be scored   (14051 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14531 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14694 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCDown'), recursive=TRUE)
#GOAnnotation(ResCCDown$ResSel, GOdata=ResCCDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCDown'), keytype='SYMBOL')
GOTable(ResCCDownAR$ResSel, maxGO=20)

Cellular Component Analysis for UP-REGULATED genes: 60 genes

# Wrapper function for topGO analysis (see helper file)
ResCCUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=CCannAR, ontology='CC', 
                        desc=NULL, nodeSize=4, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='CCUpAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1707 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1926 GO terms and 3236 relations. )
## 
## Annotating nodes ...............
##  ( 14776 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 246 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  20 nodes to be scored   (0 eliminated genes)
## 
##   Level 9:   32 nodes to be scored   (188 eliminated genes)
## 
##   Level 8:   30 nodes to be scored   (1290 eliminated genes)
## 
##   Level 7:   34 nodes to be scored   (3908 eliminated genes)
## 
##   Level 6:   38 nodes to be scored   (7868 eliminated genes)
## 
##   Level 5:   35 nodes to be scored   (9938 eliminated genes)
## 
##   Level 4:   23 nodes to be scored   (12536 eliminated genes)
## 
##   Level 3:   25 nodes to be scored   (14067 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14531 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14694 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCUp'), recursive=TRUE)
#GOAnnotation(ResCCUp$ResSel, GOdata=ResCCUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCUp'), keytype='SYMBOL')
GOTable(ResCCUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResCCAllAR$ResSel, TopGOResDown=ResCCDownAR$ResSel, TopGOResUp=ResCCUpAR$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResCCAllAR$ResSel, TopGOResDown = ResCCDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`


7. Savings

Most of the useful information has been saved during the analysis. Here I save figures, workspace and information about the session.

if (params$SaveImages == TRUE){   #Just in case since eval only works when knitting
  #Set the folder paths
  from <- paste(getwd(), paste(params$NbName, 'files/figure-html', sep='_'), sep='/')
  to <- params$FiguresFolder

  #Copy to output directory
  file.copy(from, to, recursive = TRUE, copy.mode = TRUE)
}
ResSelBP_AR <- list(ResSelAll = ResBPAllAR$ResSel,
                    ResSelUp = ResBPUpAR$ResSel,
                    ResSelDown = ResBPDownAR$ResSel)

ResSelMF_AR <- list(ResSelAll = ResMFAllAR$ResSel,
                    ResSelUp = ResMFUpAR$ResSel,
                    ResSelDown = ResMFDownAR$ResSel)

ResSelCC_AR <- list(ResSelAll = ResCCAllAR$ResSel,
                    ResSelUp = ResCCUpAR$ResSel,
                    ResSelDown = ResCCDownAR$ResSel)

ResSel_AR <- list(BP = ResSelBP_AR,
                  MF = ResSelMF_AR,
                  CC = ResSelCC_AR)

saveRDS(ResSel_AR, paste0(SavingFolder, '/day25CbO.', 'ResSel_AR.rds'))
SessionInfo <- sessionInfo()
Date <- date()

save.image(paste0(SavingFolder, '/day25CbO.', 'FunctionalAnalysisWorkspace_AR.RData'))

last update on: Fri Jan 10 20:41:55 2025